In [18]:
%matplotlib inline

Current WikiMap

wikimap create exited with these statistics:

Graphed a newtork of 11398 nodes and 9344 edges from 26202 unrendered : rendered attribute pairs, spanning 3595 infoboxes

Wikimap Status

Ran the wikimap status command in wikimap/wikimap/ like this:

$ wikimap status ./data-folder/clean_attribute_synonyms_redone.gpickle ./data-folder/infoboxes.xlsx ./data-folder/empty_report.json ./data-folder/explosion_report.json > status.log

with the following excerpted results:

Empty

This includes statistics about infoboxes for which Python WikipediaBase does not provide mappings.

Statistics:
Missing 2339 Infobox templates out of 3595 total, or 65.06%
Missing 430375 Wikipedia pages out of 2607241 total, or 16.51%
Missed template with most pages: [Template:Infobox french commune] with 36791 pages

Explosion

This includes statistics about the largest subgraph (see breakdown of subgraphs by size in "Sample Subgraphs" below).

Statistics:
There are 2131532 Wikipedia pages in the explosion out of 2607241 total, or 81.75%
In order, missed infoboxes with the most pages:
('settlement', 354090.0)
('person', 146915.0)
('album', 122492.0)
('football-biography', 118713.0)
('film', 92138.0)
('musical-artist', 75975.0)
('nrhp', 51425.0)
('company', 50172.0)
('single', 47227.0)
('officeholder', 37139.0)
('book', 30794.0)
('television', 29280.0)
('ship-characteristics', 27818.0)
('military-person', 27662.0)
('ship-career', 25193.0)
('school', 22192.0)
('sportsperson', 19344.0)
('writer', 19319.0)
('scientist', 19272.0)
('radio-station', 18663.0)
('road', 18055.0)
('university', 17891.0)
('football-club', 17065.0)
('mountain', 16137.0)
('station', 16026.0)
('military-unit', 15748.0)
('ice-hockey-player', 14392.0)
('airport', 14094.0)
('cricketer', 13781.0)
('river', 12909.0)
('organization', 12855.0)
('video-game', 12534.0)
('politician', 11483.0)
('planet', 11055.0)
('artist', 10836.0)
('lake', 10244.0)
('building', 9962.0)
('software', 9822.0)
('royalty', 8840.0)
('stadium', 8669.0)
('language', 8021.0)
('football-club-season', 7850.0)
('cyclist', 7407.0)
('television-episode', 7237.0)
('basketball-biography', 6972.0)
('church', 6836.0)
('college-coach', 6630.0)
('protected-area', 6418.0)
('newspaper', 6244.0)
('state-representative', 6030.0)
('journal', 5939.0)
('rugby-league-biography', 5891.0)
('gridiron-football-person', 5845.0)
('rugby-biography', 5690.0)
('rockunit', 5383.0)
('political-party', 5315.0)
('japan-station', 5215.0)
('weapon', 5184.0)
('song', 4960.0)
('automobile', 4903.0)
('military-structure', 4794.0)
('government-agency', 4760.0)
('magazine', 4641.0)
('museum', 4571.0)
('tennis-biography', 4288.0)
('islands', 4224.0)
('prepared-food', 4221.0)
('swimmer', 4052.0)
('congressman', 3929.0)
('award', 3833.0)
('bridge', 3764.0)
('artist-discography', 3732.0)
('artwork', 3675.0)
('airline', 3650.0)
('boxer', 3629.0)
('character', 3624.0)
('historic-site', 3384.0)
('broadcast', 3367.0)
('website', 3332.0)
('religious-building', 3292.0)
('locomotive', 3225.0)
('diocese', 3153.0)
('martial-artist', 3146.0)
('rail-line', 3139.0)
('governor', 3094.0)
('athlete', 3087.0)
('saint', 3059.0)
('dam', 3005.0)
('figure-skater', 2913.0)
('television-season', 2891.0)
('state-senator', 2864.0)
('hospital', 2834.0)
('vg', 2826.0)
('professional-wrestler', 2818.0)
('golfer', 2800.0)
('record-label', 2757.0)
('thoroughbred-racehorse', 2729.0)
('former-country', 2640.0)
('park', 2636.0)
('comics-creator', 2628.0)
('international-football-competition', 2605.0)
('monarch', 2598.0)
('district', 2502.0)
('shopping-mall', 2502.0)
('given-name', 2493.0)
('football-match', 2481.0)
('pageant-titleholder', 2380.0)
('football-tournament-season', 2344.0)
('mountain-range', 2311.0)
('concert-tour', 2217.0)
('president', 2178.0)
('rugby-team', 2098.0)
('sports-league', 2086.0)
('court-case', 2026.0)
('spaceflight', 2014.0)
('power-station', 2004.0)
('nobility', 1956.0)
('political-post', 1950.0)
('sports-season', 1932.0)
('judge', 1928.0)
('military-award', 1841.0)
('anatomy', 1838.0)
('horseraces', 1836.0)
('racing-driver', 1824.0)
('comic-book-title', 1819.0)
('play', 1772.0)
('secondary-school', 1718.0)
('handball-biography', 1686.0)
('prime-minister', 1661.0)
('criminal', 1646.0)
('constituency', 1641.0)
('china-station', 1629.0)
('cultivar', 1627.0)
('education-in-canada', 1626.0)
('mine', 1612.0)
('lighthouse', 1612.0)
('architect', 1611.0)
('philosopher', 1598.0)
('model', 1591.0)
('union', 1568.0)
('mayor', 1560.0)
('gymnast', 1544.0)
('historic-building', 1530.0)
('rugby-union-biography', 1530.0)
('adult-biography', 1515.0)
('legislature', 1498.0)
('ancient-site', 1498.0)
('school-district', 1487.0)
('soap-character-2', 1480.0)
('civilian-attack', 1462.0)
('short-story', 1447.0)
('theatre', 1432.0)
('football-league', 1414.0)
('bus-transit', 1394.0)
('former-subdivision', 1383.0)
('song-contest-entry', 1382.0)
('comedian', 1303.0)
('law-enforcement-agency', 1300.0)
('publisher', 1281.0)
('cycling-race-report', 1272.0)
('volleyball-player', 1271.0)
('basketball-club', 1269.0)
('musical', 1258.0)
('chess-player', 1252.0)
('hurricane', 1219.0)
('hotel', 1203.0)
('rail-service', 1166.0)
('skier', 1160.0)
('music-festival', 1152.0)
('street', 1135.0)
('radio-show', 1118.0)
('national-football-team', 1102.0)
('soap-character', 1087.0)
('train', 1087.0)
('mobile-phone', 1081.0)
('recurring-event', 1080.0)
('motorcycle-rider', 1079.0)
('protein-family', 1078.0)
('comics-character', 1075.0)
('language-family', 1063.0)
('venue', 1053.0)
('city', 1049.0)
('mountain-pass', 1028.0)
('hockey-team', 1020.0)
('holiday', 1019.0)
('motorcycle', 988.0)
('economist', 987.0)
('wrestling-event', 973.0)
('game', 961.0)
('rfam', 946.0)
('film-awards', 936.0)
('family-name', 934.0)
('prison', 915.0)
('zoo', 889.0)
('waterfall', 855.0)
('senator', 851.0)
('film-festival', 850.0)
('religious-biography', 846.0)
('public-transit', 838.0)
('swimming-event', 828.0)
('restaurant', 820.0)
('library', 814.0)
('racehorse', 811.0)
('body-of-water', 799.0)
('sports-team', 792.0)
('college-football-player', 786.0)
('standard', 777.0)
('engineer', 768.0)
('hiking-trail', 766.0)
('beverage', 757.0)
('mandir', 751.0)
('horseracing-personality', 745.0)
('sport-governing-body', 740.0)
('minister', 724.0)
('cricket-ground', 719.0)
('os', 709.0)
('aircraft-occurrence', 706.0)
('roller-coaster', 703.0)
('cemetery', 703.0)
('individual-golf-tournament', 698.0)
('rail', 689.0)
('speedway-rider', 681.0)
('television-channel', 676.0)
('network', 673.0)
('college-athletics', 673.0)
('cycling-race', 662.0)
('glacier', 661.0)
('poker-player', 657.0)
('convention', 655.0)
('historic-subdivision', 647.0)
('cricket-tournament', 645.0)
('astronaut', 644.0)
('london-station', 631.0)
('football-tournament', 631.0)
('ski-area', 629.0)
('camera', 627.0)
('instrument', 625.0)
('badminton-player', 622.0)
('curler', 620.0)
('curling', 617.0)
('diplomatic-mission', 610.0)
('coat-of-arms', 608.0)
('broadcasting-network', 594.0)
('information-appliance', 594.0)
('college', 593.0)
('official-post', 582.0)
('pro-wrestling-championship', 566.0)
('flag', 557.0)
('treaty', 557.0)
('golf-facility', 551.0)
('terrorist-attack', 551.0)
('programming-language', 544.0)
('brand', 542.0)
('monastery', 540.0)
('amusement-park', 531.0)
('football-official', 530.0)
('darts-player', 529.0)
('oil-field', 527.0)
('country-at-games', 525.0)
('baseball-team', 525.0)
('pro-football-player', 516.0)
('chef', 508.0)
('golf-tournament', 502.0)
('international-ice-hockey-competition', 502.0)
('kibbutz', 497.0)
('historical-event', 494.0)
('serial-killer', 491.0)
('cricket-team', 488.0)
('file-format', 488.0)
('rugby-union-season', 481.0)
('table-tennis-player', 479.0)
('monument', 479.0)
('amateur-wrestler', 474.0)
('ambassador', 474.0)
('sailor', 474.0)
('port', 467.0)
('tunnel', 467.0)
('fashion-designer', 465.0)
('sumo-wrestler', 464.0)
('presenter', 461.0)
('institute', 459.0)
('civil-conflict', 458.0)
('anthem', 458.0)
('country', 457.0)
('observatory', 455.0)
('military-installation', 451.0)
('cave', 445.0)
('dogbreed', 442.0)
('automobile-engine', 437.0)
('war-faction', 436.0)
('alpine-ski-racer', 433.0)
('wrestling-team', 431.0)
('noble', 426.0)
('news-event', 423.0)
('kommune', 422.0)
('graphic-novel', 420.0)
('handball-club', 419.0)
('rail-accident', 418.0)
('government-cabinet', 415.0)
('australian-football-club', 413.0)
('reality-music-competition', 413.0)
('casino', 410.0)
('fraternity', 410.0)
('squash-player', 409.0)
('bus-line', 407.0)
('criminal-organization', 404.0)
('motorsport-championship', 397.0)
('earthquake', 396.0)
('deity', 396.0)
('canal', 391.0)
('windmill', 389.0)
('cricket-tour', 389.0)
('writing-system', 384.0)
('bone', 383.0)
('racing-car', 381.0)
('bishop', 380.0)
('games', 378.0)
('grape-variety', 377.0)
('temple', 377.0)
('wine-region', 375.0)
('snooker-player', 373.0)
('speed-skater', 370.0)
('clergy', 367.0)
('currency', 362.0)
('rugby-football-league-season', 362.0)
('novel-series', 361.0)
('lacrosse-player', 361.0)
('artery', 358.0)
('given-name2', 354.0)
('biodatabase', 353.0)
('airliner-accident', 347.0)
('top-level-domain', 345.0)
('fencer', 342.0)
('comic-strip', 340.0)
('national-basketball-team', 340.0)
('cycling-team', 339.0)
('pharaoh', 336.0)
('orchestra', 333.0)
('tram-network', 333.0)
('athletic-conference', 333.0)
('sailboat-specifications', 331.0)
('roller-derby-league', 328.0)
('nerve', 328.0)
('police-officer', 326.0)
('winery', 324.0)
('medical-person', 323.0)
('peer', 320.0)
('cheese', 313.0)
('scholar', 312.0)
('animal', 311.0)
('horse', 310.0)
('musical-composition', 306.0)
('painting', 300.0)
('horse-race', 300.0)
('book-series', 297.0)
('font', 297.0)
('protein', 294.0)
('sports-rivalry', 285.0)
('guitar-model', 284.0)
('military-memorial', 284.0)
('high-court', 284.0)
('lt-governor', 282.0)
('muscle', 280.0)
('comics-organization', 279.0)
('comics-story-arc', 269.0)
('international-handball-competition', 261.0)
('rocket', 259.0)
('rugby-league-club', 256.0)
('surname', 254.0)
('netball-biography', 253.0)
('political-youth-organization', 252.0)
('castrum', 252.0)
('fire-department', 251.0)
('coin', 248.0)
('event', 248.0)
('volleyball-biography', 244.0)
('rowing-club', 244.0)
('badminton-event', 244.0)
('martial-art', 240.0)
('beauty-pageant', 238.0)
('bodybuilder', 237.0)
('college-sports-rivalry', 237.0)
('sport', 230.0)
('national-military', 229.0)
('esl-club', 228.0)
('color', 228.0)
('dancer', 228.0)
('comet', 228.0)
('go-player', 226.0)
('podcast', 225.0)
('regency', 224.0)
('football-country-season', 223.0)
('animanga-character', 222.0)
('mythical-creature', 222.0)
('song-contest-country', 221.0)
('college-baseball-team', 217.0)
('economy', 217.0)
('tennis-tournament', 214.0)
('wrestling-promotion', 213.0)
('cycling-championship', 212.0)
('individual-darts-tournament', 211.0)
('clan', 211.0)
('law-school', 210.0)
('ski-jumper', 210.0)
('frazione', 207.0)
('artefact', 207.0)
('rally', 205.0)
('computer', 204.0)
('organisation', 202.0)
('family', 201.0)
('field-hockey-player', 200.0)
('hockey', 199.0)
('document', 198.0)
('brewery', 195.0)
('defunct-tennis-tournament', 194.0)
('ligament', 190.0)
('exchange', 183.0)
('first-lady', 182.0)
('racecourse', 182.0)
('valley', 180.0)
('squash-tournament', 178.0)
('pipeline', 176.0)
('tornado-outbreak', 173.0)
('synthesizer', 173.0)
('card-game', 172.0)
('comics-location', 171.0)
('named-horse', 169.0)
('video-game-series', 168.0)
('rugby-tournament', 168.0)
('motor-race', 165.0)
('ecoregion', 165.0)
('flood', 164.0)
('cricket-tournament-main', 164.0)
('artifact', 163.0)
('playwright', 162.0)
('volleyball-club', 162.0)
('sport-club', 159.0)
('laboratory', 158.0)
('patriarch', 157.0)
('occupation', 157.0)
('meteorite', 157.0)
('summit', 156.0)
('dava', 156.0)
('central-bank', 155.0)
('sport-wrestler', 154.0)
('rugby-league-football-competition', 153.0)
('song-contest', 153.0)
('national-volleyball-team', 152.0)
('nuclear-weapons-test', 152.0)
('toy', 150.0)
('squash-championship', 150.0)
('sport-tournament', 148.0)
('skating-event', 148.0)
('spacecraft', 147.0)
('college-football-season', 147.0)
('aviator', 143.0)
('tennis-cup-team', 143.0)
('chess-opening', 142.0)
('residential-college', 140.0)
('rugby-league-season', 139.0)
('poultry-breed', 139.0)
('ice-hockey-award', 138.0)
('winter-storm', 138.0)
('drug', 137.0)
('cricket-club-season', 137.0)
('rocket-stage', 136.0)
('national-handball-team', 135.0)
('house', 135.0)
('wildfire', 135.0)
('country-telephone-plan', 135.0)
('convention-center', 134.0)
('mosque', 134.0)
('theologian', 134.0)
('spy', 132.0)
('trolleybus-system', 132.0)
('nebula', 132.0)
('college-marching-band', 130.0)
('pirate', 130.0)
('surfer', 129.0)
('oilfield', 128.0)
('waterlock', 128.0)
('academic-division', 127.0)
('sport-event', 127.0)
('actor', 127.0)
('telescope', 125.0)
('dog-breed', 125.0)
('polyhedron', 124.0)
('college-ice-hockey-team', 124.0)
('webcomic', 123.0)
('fictional-location', 122.0)
('pinball', 122.0)
('geopolitical-organization', 121.0)
('militant-organization', 121.0)
('speedway-league', 121.0)
('radar', 121.0)
('haplogroup', 120.0)
('seamount', 120.0)
('architectural-practice', 120.0)
('bus-company', 118.0)
('students-union', 117.0)
('college-soccer-team', 117.0)
('college-basketball-team', 116.0)
('theatre-group', 115.0)
('ice-hockey-game', 114.0)
('motorcycle-speedway-team', 112.0)
('caste', 111.0)
('department', 109.0)
('comics-meta-series', 107.0)
('international-baseball-tournament', 107.0)
('photographic-lenses', 107.0)
('national-baseball-team', 107.0)
('state', 106.0)
('governor-general', 106.0)
('heritage-railway', 105.0)
('media-franchise', 105.0)
('motorsport-venue', 104.0)
('water-park', 104.0)
('national-political-convention', 104.0)
('name-module', 103.0)
('opera', 103.0)
('former-monarchy', 103.0)
('college-basketball-tournament', 103.0)
('basketball-player', 102.0)
('referendum', 102.0)
('sheep-breed', 102.0)
('mill-building', 99.0)
('triathlete', 99.0)
('team-golf-tournament', 99.0)
('comic', 98.0)
('rebbe', 96.0)
('climber', 96.0)
('gold-mine', 96.0)
('chancellor', 95.0)
('athletics-race', 95.0)
('sporting-event-organization', 95.0)
('pig-breed', 95.0)
('leadership-election', 95.0)
('equestrian', 94.0)
('murderer', 93.0)
('football-derby', 93.0)
('medieval-text', 92.0)
('banknote', 92.0)
('nonhuman-protein', 92.0)
('hot-spring', 92.0)
('media', 92.0)
('web-browser', 91.0)
('surah', 91.0)
('noble-house', 90.0)
('mass-murderer', 89.0)
('candidate', 89.0)
('constellation', 88.0)
('launch-pad', 87.0)
('biathlete', 86.0)
('premier', 86.0)
('spacecraft-class', 86.0)
('motorway-services', 86.0)
('office-holder', 86.0)
('government-budget', 85.0)
('comics-species', 85.0)
('manhwa', 84.0)
('national-futsal-team', 84.0)
('rugby-league-cup', 84.0)
('power-transmission-line', 84.0)
('national-field-hockey-team', 83.0)
('vice-president', 83.0)
('ballet-company', 82.0)
('fragrance', 82.0)
('bus-accident', 82.0)
('floorball-club', 82.0)
('rugby-league-representative-team', 82.0)
('rugby-league-international-tournament', 82.0)
('sport-horse', 81.0)
('forest', 81.0)
('national-hockey-team', 80.0)
('rocket-engine', 79.0)
('automobile-platform', 79.0)
('pigeonbreed', 79.0)
('rugby-league-football-match', 79.0)
('lymph', 79.0)
('athletics-competition', 78.0)
('basketball-league-season', 77.0)
('product', 76.0)
('attraction', 76.0)
('comics-character-and-title', 76.0)
('field-hockey-club', 75.0)
('stage-production', 75.0)
('computer-virus', 75.0)
('ballet', 74.0)
('food', 74.0)
('poem', 74.0)
('connector', 73.0)
('academic', 73.0)
('parliament', 73.0)
('pipe-band', 72.0)
('summit-meeting', 72.0)
('chess-biography', 71.0)
('monarchy', 71.0)
('research-institute', 71.0)
('snooker-tournament', 70.0)
('academic-conference', 69.0)
('athletics-club', 69.0)
('landform', 69.0)
('sports-club', 68.0)
('tram', 68.0)
('technology-festival', 68.0)
('legislative-session', 67.0)
('aircraft', 67.0)
('townlands', 66.0)
('reality-talent-competition', 66.0)
('electric-vehicle', 66.0)
('college-softball-team', 66.0)
('motorcycle-club', 65.0)
('rugby-league-team-season', 64.0)
('themed-area', 63.0)
('cycling-team-season', 63.0)
('fictional-spacecraft', 63.0)
('tea', 62.0)
('comics-elements', 62.0)
('classical-composer', 62.0)
('song-contest-national-year', 62.0)
('ski-jumping-hill', 62.0)
('laboratory-equipment', 61.0)
('concentration-camp', 61.0)
('comics-team-and-title', 61.0)
('beer', 61.0)
('handball-league', 60.0)
('nhsc', 59.0)
('sculpture', 59.0)
('globular-cluster', 59.0)
('cluster', 58.0)
('lunar-mare', 58.0)
('comics-set-index', 58.0)
('congressional-candidate', 57.0)
('piercing', 57.0)
('disputed-islands', 57.0)
('filesystem', 57.0)
('shinty-club', 56.0)
('software-license', 56.0)
('factory', 56.0)
('superhero', 56.0)
('oil-refinery', 55.0)
('national-sports-federations', 54.0)
('aqueduct-navigable', 54.0)
('cycling-season', 54.0)
('railway-depot', 53.0)
('knot', 53.0)
('accounting-body', 51.0)
('space-telescope', 51.0)
('online-music-service', 51.0)
('boxing-match', 51.0)
('sports-division', 51.0)
('encyclical', 51.0)
('sports-announcer', 51.0)
('algorithm', 51.0)
('hurling-championship', 49.0)
('country-demographics', 49.0)
('tennis-player-season', 49.0)
('manuscript', 49.0)
('pier', 49.0)
('gotra', 49.0)
('diamond', 48.0)
('hurricane-impact', 48.0)
('hut', 48.0)
('college-lacrosse-championship', 48.0)
('tornado-single', 48.0)
('sports-conference', 48.0)
('swimming-venue', 47.0)
('college-lacrosse-team', 47.0)
('goat-breed', 47.0)
('future-infrastructure-project', 47.0)
('professional-inline-hockey-team', 47.0)
('ferry-route', 47.0)
('religion', 47.0)
('rugby-league', 47.0)
('national-motorcycle-speedway-team', 46.0)
('cricket-series-begin', 45.0)
('unit', 45.0)
('choir', 45.0)
('postage-stamp', 45.0)
('koryu', 45.0)
('particle', 45.0)
('interbank-network', 45.0)
('cattle-breed', 44.0)
('university-school', 44.0)
('folk-tale', 44.0)
('project', 44.0)
('international-hockey-competition', 44.0)
('space-agency', 44.0)
('cloud', 44.0)
('programming-block', 44.0)
('national-water-polo-team', 44.0)
('fictional-country', 43.0)
('international-floorball-competition', 43.0)
('cross-country-championships', 43.0)
('speedway-league-season', 43.0)
('operational-plan', 42.0)
('fictional-race', 42.0)
('swimming-meet', 42.0)
('tram-route', 41.0)
('island', 41.0)
('isotope', 41.0)
('rail-company', 41.0)
('water-transit', 41.0)
('interval', 41.0)
('meteorite-subdivision', 40.0)
('snooker-season', 40.0)
('climbing-route', 40.0)
('neuron', 40.0)
('region-symbols', 40.0)
('industrial-process', 39.0)
('bus-route', 39.0)
('ambulance-company', 39.0)
('martial-art-school', 39.0)
('bottled-water', 38.0)
('command-structure', 38.0)
('video-game-character', 37.0)
('comics-in-other-media', 37.0)
('superfund', 36.0)
('fictional-planet', 35.0)
('curling-club', 35.0)
('holocaust-event', 35.0)
('darts-tournament', 35.0)
('housing-project', 35.0)
('martial-art-group', 35.0)
('model-rail-scale', 35.0)
('fictional-artifact', 34.0)
('martyrs', 34.0)
('dance', 33.0)
('college-volleyball-team', 33.0)
('concert', 33.0)
('sailing-yacht', 33.0)
('swimming-pool', 33.0)
('computer-hardware', 33.0)
('sport-supporter-group', 31.0)
('whisky-distillery', 31.0)
('space-station', 31.0)
('military-attack', 31.0)
('swim-team', 31.0)
('bibliographic-database', 30.0)
('oil-spill', 30.0)
('geopolitical-organisation', 30.0)
('international-softball-tournament', 30.0)
('national-roller-hockey-team', 30.0)
('sea', 30.0)
('hockey-league', 30.0)
('custom-computer', 30.0)
('wheelchair-tennis-player', 30.0)
('royal-house', 30.0)
('drums-corps', 29.0)
('national-netball-team', 29.0)
('clothing-item', 29.0)
('limited-overs-final', 29.0)
('school-athletics', 29.0)
('manhua', 29.0)
('civil-servant', 28.0)
('urban-feature', 28.0)
('drug-class', 28.0)
('historian', 28.0)
('aerial-lift-line', 28.0)
('television-family', 27.0)
('typeface', 27.0)
('bus', 27.0)
('tennis-circuit-season', 27.0)
('short-track-speed-skater', 27.0)
('football-association', 27.0)
('athletics-event', 26.0)
('defense-minister', 26.0)
('tool', 26.0)
('speaker', 26.0)
('theatre-festival', 26.0)
('computer-hardware-bus', 26.0)
('sports-award', 26.0)
('examination', 25.0)
('festival', 25.0)
('machinima', 25.0)
('mountaineer', 25.0)
('index', 25.0)
('storm', 25.0)
('water-ride', 25.0)
('gaming-group', 25.0)
('census', 25.0)
('comics-nationality', 25.0)
('military-test-site', 25.0)
('netball-team', 24.0)
('basketball-tournament-at-games', 24.0)
('skyscraper', 24.0)
('calculator', 24.0)
('astronomical-survey', 24.0)
('deputy-prime-minister', 24.0)
('swimming-association', 24.0)
('machine', 23.0)
('electronic-component', 23.0)
('dual-roller-coaster', 23.0)
('livery-company', 23.0)
('horse-breed', 23.0)
('eruption', 23.0)
('basketball-game', 23.0)
('mtgset', 23.0)
('field-hockey-league-season', 23.0)
('olive-cultivar', 22.0)
('rare-stamps', 22.0)
('college-wrestling-team', 22.0)
('photographic-film', 22.0)
('technology-standard', 22.0)
('archaeological-culture', 22.0)
('circus', 22.0)
('storage-medium', 21.0)
('wuxia-fiction-character', 21.0)
('certification-mark', 21.0)
('cricketer-tour-biography', 21.0)
('aqueduct', 21.0)
('presidential-library', 21.0)
('paranormal-term', 20.0)
('advertising', 20.0)
('terrestrial-impact-site', 20.0)
('campground', 20.0)
('yacht-club', 20.0)
('data-structure', 20.0)
('retail-market', 20.0)
('comedy-group', 20.0)
('biography', 19.0)
('holiday-camp', 19.0)
('electricity-sector', 19.0)
('genome', 19.0)
('cycling-championships', 18.0)
('single-nucleotide-polymorphism', 18.0)
('country-geography', 18.0)
('watch', 18.0)
('space-shuttle', 18.0)
('fictional-organisation', 18.0)
('pictish-stone', 18.0)
('standardref', 18.0)
('surf-club', 18.0)
('pretender', 18.0)
('photographic-lens', 17.0)
('musician', 17.0)
('castle', 17.0)
('tribe', 17.0)
('hillclimb-venue', 17.0)
('mancala', 17.0)
('historical-era', 17.0)
('tennis-player-season-2', 17.0)
('continent', 17.0)
('synchronized-skating-team', 17.0)
('basketball-club-season', 17.0)
('polygon', 17.0)
('comics-set-and-title', 16.0)
('rugby-league-team', 16.0)
('distributed-computing-project', 16.0)
('fictional-vehicle', 16.0)
('computer-hardware-generic', 16.0)
('legislative-district', 16.0)
('whitewater-course', 16.0)
('high-school', 15.0)
('shipping-job', 15.0)
('robot', 15.0)
('boat-race', 15.0)
('property-development', 15.0)
('spree-killer', 15.0)
('yoga-school', 15.0)
('journalist', 15.0)
('chipset', 15.0)
('tornado', 14.0)
('tennis-match', 14.0)
('networking-protocol', 14.0)
('artificial-fly', 14.0)
('novel', 14.0)
('spring', 14.0)
('multichoice-referendum', 14.0)
('zone', 14.0)
('county', 13.0)
('vacuum-tube', 13.0)
('baseball-game', 13.0)
('reactor', 13.0)
('tennis-player', 13.0)
('college-football-bowl-game', 13.0)
('football-club-season2', 13.0)
('film-movement', 12.0)
('pope', 12.0)
('historic-area', 12.0)
('sports-centre', 12.0)
('province', 12.0)
('soap-opera-family', 12.0)
('meteor-shower', 12.0)
('actress', 12.0)
('gunpowder-plot', 12.0)
('college-swim-team', 12.0)
('zodiac', 12.0)
('cable', 11.0)
('martial-arts-tournament', 11.0)
('college-tennis-team', 11.0)
('alternative-medicine', 11.0)
('professional-bowler', 11.0)
('softball-team', 11.0)
('climbing-area', 11.0)
('pandemic', 11.0)
('first-minister', 11.0)
('college-track-and-field-team', 11.0)
('fictional-creature', 11.0)
('supernova', 11.0)
('statue', 11.0)
('pelotari', 11.0)
('intangible-heritage', 11.0)
('basketball-official', 11.0)
('bus-station', 10.0)
('college-field-hockey-team', 10.0)
('field-hockey-league', 10.0)
('movie-quote', 10.0)
('desalination-plant', 10.0)
('bridge-type', 10.0)
('business-park', 10.0)
('speedway-grand-prix-event', 10.0)
('hymnal', 10.0)
('cricketer-biography', 10.0)
('colour', 10.0)
('highway-system', 10.0)
('famine', 10.0)
('clothing-type', 10.0)
('college-golf-team', 10.0)
('pseudoscience', 10.0)
('expansion-draft', 10.0)
('pulps-character', 10.0)
('martial-art-form', 9.0)
('mascot', 9.0)
('school-marching-band', 9.0)
('animal-breed', 9.0)
('airline-alliance', 9.0)
('dog-crossbreed', 9.0)
('performing-art', 9.0)
('college-gymnastics-team', 9.0)
('animated-superhero', 9.0)
('docks', 9.0)
('tram-depot', 9.0)
('space-station-module', 9.0)
('computing-standard', 9.0)
('video-game-online-service', 9.0)
('motor-racing-team', 8.0)
('sailing-competition', 8.0)
('integer-sequence', 8.0)
('band', 8.0)
('director', 8.0)
('nrl-club', 8.0)
('bay', 8.0)
('bullfighting-career', 8.0)
('village', 8.0)
('water-buffalo-breed', 8.0)
('protocol', 8.0)
('diving-equipment', 8.0)
('furniture', 8.0)
('college-cross-country-team', 8.0)
('sport-overview', 8.0)
('cycling-hill-climb', 8.0)
('beer-style', 8.0)
('fictional-business', 7.0)
('botanical-product', 7.0)
('heraldic-knot', 7.0)
('mining', 7.0)
('tree', 7.0)
('donkey', 7.0)
('night-vision-device', 7.0)
('television-advert', 7.0)
('deputy-first-minister', 7.0)
('serial-publication', 7.0)
('league-commissioner', 7.0)
('television-show', 7.0)
('railway-line', 7.0)
('literary-genre', 7.0)
('tv', 6.0)
('lens-design', 6.0)
('waterway', 6.0)
('movie-camera', 6.0)
('runestone', 6.0)
('recurring-sailing-competition', 6.0)
('private-school', 6.0)
('executive-government', 6.0)
('singer', 6.0)
('rugby-match', 6.0)
('motorsport-round', 6.0)
('rail-franchise', 6.0)
('combat-robot', 6.0)
('telecommunications-network', 6.0)
('wifi-network', 6.0)
('rink-hockey-club', 5.0)
('movie', 5.0)
('software2', 5.0)
('sports-draft', 5.0)
('town', 5.0)
('cardinal', 5.0)
('military-aviation-unit', 5.0)
('mobile-suit', 5.0)
('gem', 5.0)
('financial-index', 5.0)
('bowl-series', 5.0)
('region', 5.0)
('national-football-team-season', 5.0)
('fictional-element', 5.0)
('photographer', 5.0)
('theater', 5.0)
('fieldbus-protocol', 5.0)
('spacecraft-instrument', 5.0)
('attraction-model', 5.0)
('wargame', 5.0)
('art-movement', 4.0)
('sport-centre', 4.0)
('hotspot-custom', 4.0)
('official-football-team', 4.0)
('phone', 4.0)
('room', 4.0)
('cricket-club', 4.0)
('field-hockey', 4.0)
('equestrian-championships', 4.0)
('dance-company', 4.0)
('comics-artificial-species', 4.0)
('engineering-career', 4.0)
('identifier', 4.0)
('snowboarder', 4.0)
('halftime-show', 4.0)
('peninsulas', 4.0)
('crown', 4.0)
('doll', 4.0)
('sports-series', 4.0)
('peninsula', 4.0)
('garden', 3.0)
('comics-object-and-title', 3.0)
('mobile-network', 3.0)
('author', 3.0)
('hostel', 3.0)
('terrorist-organization', 3.0)
('place', 3.0)
('character-encoding', 3.0)
('presidential-government', 3.0)
('college-rowing-team', 3.0)
('flash-series', 3.0)
('magnetosphere', 3.0)
('rugby-tour', 3.0)
('feature-on-celestial-object', 3.0)
('rebreather', 3.0)
('fictional-ship', 3.0)
('college-track-and-field', 3.0)
('location', 3.0)
('file-system', 3.0)
('comics-species-and-title', 3.0)
('military-operation', 3.0)
('country-deforestation', 2.0)
('year-in-country', 2.0)
('videogame', 2.0)
('tropical-cyclone', 2.0)
('composer', 2.0)
('dark-ride', 2.0)
('goat', 2.0)
('square', 2.0)
('books', 2.0)
('historical-continent', 2.0)
('solar-eclipse', 2.0)
('crater-data', 2.0)
('television-series', 2.0)
('drug-mechanism', 2.0)
('miniseries', 2.0)
('sports-in-region', 2.0)
('knot-details', 2.0)
('performer', 2.0)
('film-director', 2.0)
('football-biography-2', 2.0)
('conference', 2.0)
('controversial-invention', 2.0)
('football-player', 2.0)
('footballer', 2.0)
('political-coalition', 2.0)
('water-supply-and-sanitation', 2.0)
('body-process', 2.0)
('stausee', 2.0)
('urban-development-project', 2.0)
('aus-sport-club', 2.0)
('residency-show', 1.0)
('college-inline-hockey-team', 1.0)
('military', 1.0)
('planetary-system', 1.0)
('missile', 1.0)
('cometary-globule', 1.0)
('caliph', 1.0)
('horse-color', 1.0)
('archaeological-site', 1.0)
('open-cluster', 1.0)
('badminton-team', 1.0)
('television-station', 1.0)
('space-mission', 1.0)
('township', 1.0)
('particular-church', 1.0)
('cartoon-character', 1.0)
('area', 1.0)
('cricket-biography', 1.0)
('mtgblockset', 1.0)
('municipality', 1.0)
('periodical', 1.0)
('publication', 1.0)
('doge', 1.0)
('cartoon', 1.0)
('sports-announcer-details', 1.0)
('paranormal-creature', 1.0)
('cricket-player', 1.0)
('computer-peripheral', 1.0)
('automobile-generation', 1.0)
('film-actor', 1.0)
('graphics-processing-unit', 1.0)
('reservoir', 1.0)
('tabletennis-player', 1.0)
('radio-show2', 1.0)
('internet-video', 1.0)
('ice-hockey-team-season', 1.0)
('country-languages', 1.0)
('railway', 1.0)
('journal-series', 1.0)
('minister-office', 1.0)
('poet', 1.0)
('independent-baseball-team', 1.0)
('church2', 1.0)
('physician', 1.0)
('pyramid', 1.0)
('gunpowder-plotter', 1.0)
('actor-voice', 1.0)
('genetically-modified-organism', 1.0)
('victim', 1.0)
('people', 1.0)
('firearm', 1.0)
('college-ski-team', 1.0)
('faunal-age', 1.0)
('primeval-creature', 1.0)
('visual-artist', 1.0)
('cinema', 1.0)
('military-post', 1.0)
('subdivision', 1.0)

Sample Subgraphs

Import libraries and cleaned up graph as G, show the distribution of subgraphs.


In [1]:
from wikimap import data, stats, graph

In [2]:
import networkx as nx

In [3]:
import matplotlib.pyplot as plt

In [4]:
G = data.read_graph("./wikimap/data-folder/wikimap-live.gpickle")

In [5]:
G.connected_component_statistics(printStats=True)


1 nodes: 1943 (37.32%) groups
2 nodes: 2579 (49.54%) groups
3 nodes: 376 (7.22%) groups
4 nodes: 151 (2.9%) groups
5 nodes: 56 (1.08%) groups
6 nodes: 26 (0.5%) groups
7 nodes: 16 (0.31%) groups
8 nodes: 14 (0.27%) groups
9 nodes: 12 (0.23%) groups
10 nodes: 8 (0.15%) groups
11 nodes: 1 (0.02%) groups
12 nodes: 2 (0.04%) groups
13 nodes: 5 (0.1%) groups
14 nodes: 2 (0.04%) groups
15 nodes: 1 (0.02%) groups
17 nodes: 2 (0.04%) groups
22 nodes: 1 (0.02%) groups
23 nodes: 1 (0.02%) groups
24 nodes: 1 (0.02%) groups
25 nodes: 2 (0.04%) groups
27 nodes: 1 (0.02%) groups
28 nodes: 1 (0.02%) groups
33 nodes: 1 (0.02%) groups
66 nodes: 1 (0.02%) groups
107 nodes: 1 (0.02%) groups
158 nodes: 1 (0.02%) groups
1002 nodes: 1 (0.02%) groups
-----------------------------------------
TOTAL: 11398 nodes in network, 5206 distinct groups

One node groups are basically useless to us, so let's take a look first at a few two node groups.

Note on Graphing

Since matplotlib is such a large dependency, but only necessary for graphing networks, it is not included by default. To graph networks, including those in this notebook, simply install matplotlib:

$ pip install matplotlib

Two node groups


In [6]:
two_node_subgraphs = G.connected_components_with_size(2)

These graphs have a simple structure: just two nodes connected by an edge. Unfortunately, directionality is lost in the above command, but that is okay for our purposes. Let's just see what nodes there are in a few groups.


In [7]:
two_node_subgraphs[0].nodes()


Out[7]:
['wusopen', "us women's open"]

In [8]:
two_node_subgraphs[1].nodes()


Out[8]:
['cartridge weight', 'shell weight']

Hm... let's see what is the rendering state for these nodes.


In [9]:
G.rendering_of_graph_node('wusopen')


Out[9]:
'unrend'

In [10]:
G.rendering_of_graph_node("us women's open")


Out[10]:
'rend'

In [11]:
G.infoboxes_of_pair('wusopen', "us women's open")


Out[11]:
['golfer']

In [12]:
G.rendering_of_graph_node('cartridge weight')


Out[12]:
'unrend'

In [13]:
G.rendering_of_graph_node('shell weight')


Out[13]:
'rend'

In [14]:
G.infoboxes_of_pair('cartridge weight', "shell weight")


Out[14]:
['firearm', 'weapon']

Looks like two node subgraphs are just simple unrend ==> rend mappings. Some are valuable such as the above "cartridge weight" and "shell weight".

Three node subgraphs


In [15]:
three_node_subgraphs = G.connected_components_with_size(3)

In [16]:
three_node_subgraphs[0].nodes()


Out[16]:
['town or city', 'town', 'location town']

In [20]:
nx.draw(three_node_subgraphs[0], with_labels=True)


So "metering" is the hub between "exposure metering" and "share of household metering". BTW, edges are bi-directional directionality is eliminated when selecting a subgraph.


In [21]:
three_node_subgraphs[1].nodes()


Out[21]:
['towers', 'weak struct towers', 'robust struct towers']

In [22]:
three_node_subgraphs[2].nodes()


Out[22]:
['num users', 'users', 'active users']

In [23]:
nx.draw(three_node_subgraphs[2], with_labels=True)



In [24]:
G.infoboxes_of_pair("type fauna", "principal")


Out[24]:
['faunal-age']

In [25]:
G.infoboxes_of_pair("principal", "team principal")


Out[25]:
['motor-racing-team']

In [26]:
three_node_subgraphs[3].nodes()


Out[26]:
['name origin', 'origin of name', 'etymology']

In [27]:
nx.draw(three_node_subgraphs[3], with_labels=True)



In [28]:
G.infoboxes_of_pair("ms", "men's singles")


Out[28]:
['badminton-event']

In [29]:
G.infoboxes_of_pair("country ms", "men's singles")


Out[29]:
['badminton-event']

In [30]:
G.rendering_of_graph_node("country ms")


Out[30]:
'unrend'

In [31]:
G.rendering_of_graph_node("men's singles")


Out[31]:
'rend'

In [32]:
G.rendering_of_graph_node("ms")


Out[32]:
'unrend'

In [33]:
three_node_subgraphs[4].nodes()


Out[33]:
['text', 'web la', 'web en']

In [34]:
three_node_subgraphs[5].nodes()


Out[34]:
['potassium mg', 'k', 'potassium']

In [35]:
three_node_subgraphs[6].nodes()


Out[35]:
['msrp-currency', 'msrp-date', 'msrp']

In [36]:
three_node_subgraphs[7].nodes()


Out[36]:
['provideragency', 'services provided by', 'provider agency']

In [37]:
nx.draw(three_node_subgraphs[7], with_labels=True)



In [40]:
three_node_subgraphs[8].nodes()


Out[40]:
['yearenddoublesranking', 'yearendsinglesranking', 'year-end ranking']

In [41]:
three_node_subgraphs[9].nodes()


Out[41]:
['office name', 'officename', '!!!!!officetype!!!!!!!!!!office_type!!!!!s']

In [42]:
three_node_subgraphs[10].nodes()


Out[42]:
['wimbledondoublesjuniorresult', 'wimbledonjuniorresult', 'wimbledon junior']

In [43]:
nx.draw(three_node_subgraphs[10], with_labels=True)



In [44]:
three_node_subgraphs[11].nodes()


Out[44]:
['orbits per day', 'orbits daily', 'orbits day']

In [45]:
nx.draw(three_node_subgraphs[11], with_labels=True)



In [46]:
G.rendering_of_graph_node("provideragency")


Out[46]:
'unrend'

In [47]:
G.rendering_of_graph_node("services provided by")


Out[47]:
'rend'

In [48]:
G.rendering_of_graph_node("provider agency")


Out[48]:
'unrend'

So "services provided by" is the hub between "provideragency" and "provider agency". Interesting that "provider agency" is unrend -- looks like I actually did a good job of cleaning up nodes!

Four node subgraphs


In [49]:
four_node_subgraphs = G.connected_components_with_size(4)

In [50]:
four_node_subgraphs[0].nodes()


Out[50]:
['team principal', 'type fauna', '!!!!!principal_label!!!!!', 'principal']

In [51]:
four_node_subgraphs[1].nodes()


Out[51]:
['known', 'known for', 'signature wine', 'knownfor']

In [52]:
nx.draw(four_node_subgraphs[1], with_labels=True)



In [53]:
four_node_subgraphs[2].nodes()


Out[53]:
['comic yr', 'comic pub', 'comic yrend', '!!!!!comic!!!!!']

In [54]:
four_node_subgraphs[3].nodes()


Out[54]:
['preview', 'preview date', 'preview version release date', 'preview version']

In [55]:
four_node_subgraphs[4].nodes()


Out[55]:
['thesis year', 'thesis url', 'thesis title', 'thesis']

In [56]:
four_node_subgraphs[5].nodes()


Out[56]:
['number of issues', 'origissues', 'issues', 'controversy']

In [57]:
four_node_subgraphs[6].nodes()


Out[57]:
['security class', 'classifications', 'main classification', 'classification']

In [58]:
four_node_subgraphs[7].nodes()


Out[58]:
['t20umpired', 't20 umpired', 'umpt20debutyr', 'umpt20lastyr']

In [59]:
four_node_subgraphs[8].nodes()


Out[59]:
['ship speed', 'speed', 'average speed', 'operating speed']

In [60]:
four_node_subgraphs[9].nodes()


Out[60]:
['total launches', 'known launches', 'tlaunches', 'launches']

In [61]:
nx.draw(four_node_subgraphs[9], with_labels=True)


Eight node subgraphs


In [62]:
eight_node_subgraphs = G.connected_components_with_size(8)

In [63]:
eight_node_subgraphs[0].nodes()


Out[63]:
['defunct',
 'ceased operation',
 'year folded',
 'deactivated',
 'folded',
 'ceased',
 'yeardeactivated',
 'ceased operations']

In [64]:
nx.draw(eight_node_subgraphs[0], with_labels=True)



In [65]:
eight_node_subgraphs[1].nodes()


Out[65]:
['designation3 date',
 'designation4 date',
 'designation2 date',
 'designated',
 'designated as nhl',
 'designation5 date',
 'designation1 date',
 'designated date']

In [66]:
nx.draw(eight_node_subgraphs[1], with_labels=True)



In [67]:
G.rendering_of_graph_node("identifiers")


Out[67]:
'rend'

In [68]:
G.rendering_of_graph_node("lid")


Out[68]:
'unrend'

In [69]:
G.rendering_of_graph_node("iata")


Out[69]:
'unrend'

In [70]:
G.rendering_of_graph_node("icao")


Out[70]:
'unrend'

In [71]:
G.rendering_of_graph_node("wmo")


Out[71]:
'unrend'

In [72]:
G.rendering_of_graph_node("faa")


Out[72]:
'unrend'

In [73]:
G.rendering_of_graph_node("tc")


Out[73]:
'unrend'

In [74]:
G.rendering_of_graph_node("gps")


Out[74]:
'unrend'

Well at least we know that those non-sense nodes are all unrend! I should probably ignore unrend entirely.


In [75]:
eight_node_subgraphs[2].nodes()


Out[75]:
['current title',
 'reprint',
 'title',
 'championship titles',
 'titles',
 'career title',
 '!!!!!titleyears!!!!!',
 'title on the line']

In [76]:
nx.draw(eight_node_subgraphs[2], with_labels=True)



In [77]:
eight_node_subgraphs[3].nodes()


Out[77]:
['cause',
 'date of death',
 'place of death',
 'death cause',
 'death place',
 'cause of death',
 'death date',
 'died']

In [78]:
nx.draw(eight_node_subgraphs[3], with_labels=True)



In [79]:
G.infoboxes_of_pair("sibling names", "related")


Out[79]:
['bridge-type']

In [80]:
eight_node_subgraphs[4].nodes()


Out[80]:
['awards and prizes',
 'notable awards',
 'awards and prizes and listings',
 'awards won',
 'civilian awards',
 'awards',
 'music awards',
 'ren awards']

In [81]:
nx.draw(eight_node_subgraphs[4], with_labels=True)


Thirteen node subgraphs


In [82]:
thirteen_node_subgraphs = G.connected_components_with_size(13)

In [83]:
thirteen_node_subgraphs[0].nodes()


Out[83]:
['cnrr',
 'nkrr',
 'rr',
 'rrho',
 'rrgye',
 'revised romanization',
 'rra',
 'revisedromanization',
 'rrja',
 'rrph',
 'rrstage',
 'skrr',
 'rrborn']

In [84]:
thirteen_node_subgraphs[1].nodes()


Out[84]:
['listed weight',
 'wet weight',
 'weight',
 'billed weight',
 'weight gain loss',
 'weight footnote',
 'maleweight',
 'current weight divisions',
 'dry weight',
 'rated at',
 'male weight',
 'curb weight',
 'weight g']

In [85]:
nx.draw(thirteen_node_subgraphs[1], with_labels=True)


Fourteen node subgraphs


In [86]:
fourteen_node_subgraphs = G.connected_components_with_size(14)

In [87]:
fourteen_node_subgraphs[0].nodes()


Out[87]:
['Population (!!!!!popyear!!!!!)!!!!!population_footnotes!!!!!',
 'total population',
 'sizepopulation',
 'citizen voting age',
 'population as of',
 'populationdate',
 'Population (!!!!!population_year!!!!!)  * Voting age  * Citizen voting age',
 'voting age',
 'population - total - catholics',
 'catholics percent',
 'size population',
 'size of population',
 'catholics',
 'population']

In [88]:
fourteen_node_subgraphs[1].nodes()


Out[88]:
['prospect league',
 'currentclub',
 '!!!!!league!!!!! team (P) Cur. team Former teams',
 'team members',
 'ru currentteam',
 '!!!!!league!!!!! team',
 'currentteam',
 'currentcoachteam',
 'current team',
 'team',
 'current club',
 'prospect team',
 'notable entrants',
 'former teams']

Fifteen node subgraphs


In [89]:
fifteen_node_subgraphs = G.connected_components_with_size(15)

In [90]:
fifteen_node_subgraphs[0].nodes()


Out[90]:
['later work',
 'other characteristics',
 'other info',
 'primary alcohol by volume',
 'other outcome',
 'laterwork',
 'other-information',
 'othernationalistic',
 'other work',
 'other information',
 'other type fauna',
 'other notes',
 'otherlithology',
 'other',
 'otherwins']

In [91]:
nx.draw(fifteen_node_subgraphs[0], with_labels=True)


33 node subgraph


In [97]:
fourfour_node_subgraphs = G.connected_components_with_size(33)

In [98]:
fourfour_node_subgraphs[0].nodes()


Out[98]:
['members',
 'membership num',
 'highest frequencies',
 'crew members',
 'teamsstart',
 'lifetime',
 'Membership (!!!!!membership_year!!!!!)',
 'staff',
 'teamsfinish',
 'Membership  (!!!!!membership_year!!!!!)',
 'notable members',
 'crews',
 'employee',
 'staff writers',
 'num employees',
 'numemployees',
 'plan membership',
 'num staff',
 'num members',
 'nummembers',
 'notable',
 'membership',
 'members year',
 'membership est',
 'staff writer',
 'current members',
 'notable martyrs',
 '!!!!!members_label!!!!!',
 'employees',
 'memberships',
 'num employees year',
 '!!!!!membership_type!!!!!',
 'number of employees']

In [106]:
nx.draw(fourfour_node_subgraphs[0], with_labels=True)


Largest subgraph


In [107]:
max_size = max(G.connected_component_lengths())

In [108]:
print "Size of largest subgraph: " + str(max_size) + " nodes"


Size of largest subgraph: 1002 nodes

In [109]:
max_graph = G.connected_components_with_size(max_size)[0]

In [110]:
max_graph.nodes()


Out[110]:
['',
 'coach',
 'founder',
 'All-time series (!!!!!league!!!!! only)',
 'serviceyears',
 'first played',
 'parentopening',
 'last tournament',
 'first published in',
 'founded',
 'manager',
 'bus owner',
 'track no',
 'chair',
 'captain',
 'aliases',
 'regulated by',
 'homeworld',
 'birthplace',
 'ship commissioned',
 'womens-tenure',
 'related fields',
 'population note',
 'most recent',
 'date taken',
 'participants',
 'languages',
 'year established',
 'power rating',
 'homepage',
 'competing nations',
 'coor',
 'ship capacity',
 'episode creator',
 'host country',
 'external link',
 'hostcity',
 'life span',
 'literally',
 'coach losses',
 'full name',
 'invent-name',
 'birth rate',
 'key people',
 'last meeting date',
 'formation place',
 'venue',
 'coord ref',
 'no of competitors',
 'spectator capacity',
 'planet coords',
 'wc appearances',
 'founder name',
 'artist',
 'tea time',
 'neighborhood',
 'leaders',
 'disciplines',
 'main venue',
 'implementation language',
 'first built',
 'licensee',
 'construction dates',
 'design',
 'main stadium',
 'synchroname',
 'latest release date',
 'established',
 'parent company',
 'nearest settlement',
 'section',
 'formedmonthday',
 'written by',
 'current',
 'version',
 'femaleheight',
 'capital',
 'international',
 'effective since',
 'first appeared',
 'confirmacion place',
 'area footnotes',
 'location of document',
 'betreiber',
 'local authority',
 'full',
 'national water and sanitation company',
 'latest version',
 'commissioning date',
 'inauguration',
 'founded place',
 'power therm elec',
 'class ab power output',
 'to date',
 'region or state',
 'first release',
 'groups',
 'address',
 'active',
 'decom',
 'invented',
 'Number of !!!!!sport!!!!! clubs',
 'publisher',
 'host city',
 'copyright',
 'notable locations',
 'premiere location',
 'founding location',
 'total power',
 'webcite',
 'planned retirement',
 'last release',
 'goldw',
 'associated people',
 '!!!!!languages_type!!!!!',
 'altname',
 'formation',
 'residents',
 'produced by',
 'winning coach',
 'used by',
 'location country',
 'south website',
 'festival date',
 'fiscal year',
 'names',
 'read online',
 'common name',
 'stateabb',
 'num branches',
 'use',
 'from',
 'regional affiliation',
 'time in space',
 'pref',
 'release dates',
 'sports',
 'current headquarters',
 'head coach',
 'last aired',
 'location-dir',
 'manager visitor',
 'current season',
 'minor teams',
 'leader party',
 'ship name',
 'company',
 'periodculture',
 'studio albums',
 'flag',
 'coords ref',
 'commissioned',
 'current head',
 'paralympicteams',
 'symbol type article',
 'year of formation',
 'sub-confederations',
 'final award',
 'created',
 'first holder',
 'team manager',
 'firstapp',
 'official website',
 'latest meeting',
 'head union',
 'demolition date',
 'inscription',
 'mint website',
 'alternate names',
 'additional names',
 'first ep',
 'zone appearances',
 'author of publication',
 'states',
 'owned by',
 'leader name',
 'divefounded',
 'location city',
 'language footnote',
 'firstappearance',
 'end',
 'competitors women',
 'nearest town',
 'years offered',
 'competition date',
 '!!!!!chairperson_label!!!!!',
 'other name',
 'currentowner',
 'chief executive',
 'writers',
 'notable aliases',
 'construction completed',
 'ancestor',
 'establishment',
 'area of operations',
 'orig lang',
 'wsa year',
 'designer',
 'gov body',
 'blank emblem type',
 'menscoach',
 '<!!!!!Prev!!!!! *',
 'active ingredients',
 'location place',
 'stadium capacity',
 '!!!!!stdtitle!!!!!',
 'lastmo',
 'swimfounded',
 'olympicteams',
 'date of premiere',
 'president',
 'therapeutic use',
 'years of service',
 'wpfounded',
 'introduced',
 'branches',
 'debuting countries',
 'symbol type',
 'first event',
 'start note',
 'dst note',
 'headquarters',
 'coordinates footnotes',
 'alias',
 'alternative designations',
 'fromdate',
 'tenure',
 'official language form',
 'publishing country',
 'began',
 'executed',
 'producer',
 'res surface',
 'ecclesiastical province',
 'language of origin',
 'ended',
 'synonyms',
 'keypeople',
 'managers',
 'ship in service',
 'developer',
 'geolocate',
 'areas affected',
 '01 gold',
 '!!!!!flag_type!!!!!',
 'last service',
 'coordinates',
 'birthname',
 'latd',
 'founding guru',
 'gold troy oz',
 'latitude',
 'date aired',
 '!!!!!coachtitle!!!!!',
 '!!!!!ceotag!!!!!',
 'affected',
 'web address',
 'ceo',
 'parent school',
 'plant commission',
 'start year',
 'location-city',
 'nation',
 'hq location city',
 'created by',
 'online service',
 'most appearances',
 'important associated figures',
 'silver',
 'manager home',
 'decommissioned',
 'name',
 'designed by',
 'place premiered',
 'debut',
 'water source',
 'obverse designer',
 'decommission',
 'alternativa place',
 'original release',
 'area ref',
 'zodiac symbol',
 'hq city',
 'managed by',
 'oscoor',
 'area - maximum',
 'operated by',
 'first service',
 'ship operator',
 'date const',
 'series',
 'producedby',
 'notable characters',
 'operator',
 'web site',
 'year',
 'issuing authority website',
 'education required',
 'coat of arms',
 'stage surface',
 'websitetitle',
 'alt names',
 'releasedate',
 'commission date',
 'currently held by',
 'first flight',
 'first edition',
 'homecaptain',
 'team captain',
 'international affiliation',
 'mens-tenure',
 'operating area',
 'female height',
 'home world',
 'supports',
 'land area',
 'foundation',
 'country-origin',
 'worldsteams',
 'plant lat d',
 '!!!!!capital_type!!!!!',
 'existed',
 '!!!!!resultyears!!!!!',
 'chief exec',
 'launch site',
 'base',
 'date max area',
 'launchdate',
 'poweroutput',
 'epicenter',
 'date closed',
 'arena name',
 'owners',
 'service',
 'language',
 'launch',
 'construction started',
 'prenomen hiero',
 'loctext',
 'region ts',
 'active region',
 'place',
 'coordinates ts',
 'full case name',
 'football stadium',
 'awaycaptain',
 'teammates',
 'first',
 'origin',
 'bus operator',
 'inflation source date',
 'active personnel',
 'owned',
 'class c power output',
 'lat',
 'ship decommissioned',
 'from date',
 'owner',
 '!!!!!country_type!!!!!',
 'common languages',
 'utc',
 'base color',
 'city',
 'management',
 'major teams',
 'introduction',
 'origin lat d',
 'supporting character of',
 'maidenname',
 'origin time',
 'worst case performance',
 'golden hiero',
 'element of stories featuring',
 'convention',
 'pub series',
 'final',
 'south operator',
 'original language',
 'pageurl',
 'transmitter coordinates',
 'latest release',
 'pub date',
 'park',
 'first series',
 'appears in',
 'first date',
 'released',
 'psa year',
 'authors',
 'location signed',
 'translation',
 'prefecture',
 'holder',
 '- abolished',
 'plant name',
 'inventor',
 'aka',
 'last appearance',
 'obverse design',
 'utc offset',
 '* !!!!!leader_title!!!!!',
 'latest release version',
 'orbital period',
 'fullname',
 'ship launched',
 'published',
 'date created',
 'area note',
 'reverse design',
 'decommission date',
 'invented by',
 'topics',
 'locations',
 'varsity teams',
 'largest cities',
 'demolished date',
 'current region',
 'oly appearances',
 'origin coordinates',
 'start date',
 'borough',
 'femaleweight',
 'nextseason',
 'silverw',
 '!!!!!coach_title!!!!!',
 'date of formation',
 'date',
 'built',
 'date of introduction source',
 'available in',
 'fr total population estimate year',
 'official languages',
 'cylinder head alloy',
 'canton',
 'official language in',
 'date-built',
 'country represented',
 'demolished',
 'chairman',
 'ground coordinates',
 'clan affiliations',
 'coachcount',
 'silverm',
 'version date',
 'nahestadt',
 'closing date',
 'province',
 'Average electricity use (!!!!!useyear!!!!!)',
 'branchto',
 'ring name',
 'ship owner',
 'franchise dates',
 'men teams',
 'translated name',
 'max area',
 'os grid ref',
 'also called',
 'hq location country',
 'time zones',
 'printer website',
 'debutmo',
 'final holder',
 'reverse designer',
 'creationdate',
 'synonym',
 'silver troy oz',
 'goldm',
 '!!!!!leader_type!!!!!',
 'url',
 'closed date',
 'totalarea',
 'wpname',
 'women teams',
 '02 silver',
 'date released',
 'total of teams',
 '!!!!!CEO_title!!!!!',
 'location state',
 'formation year',
 'number of teams',
 'first episode',
 'site area',
 'surface area',
 'gold',
 'runemaster',
 'visitor',
 'country or region',
 'country of origin',
 'discovered',
 'availability by country',
 'air dates',
 'headquarters location',
 'first new world',
 'o start',
 'home page',
 '!!!!!mgrtitle!!!!!',
 'availability',
 'seating-capacity',
 'ground',
 'coach ties',
 'transdate',
 'based',
 'nicknames',
 'en alt title',
 'lat degrees',
 'sizeland',
 'inaugurated',
 'athletics nicknames',
 'developers',
 'mens',
 'webpage',
 'names list',
 'region of origin',
 'subsection',
 'alternative name',
 'nation from',
 'member party',
 'coor pinpoint',
 'production company',
 'home venue',
 '!!!!!event_type!!!!!',
 'north operator',
 'settlements',
 'swimname',
 'notable people',
 'first awarded',
 'year active',
 'last new world',
 'clans',
 'years',
 'service branches',
 'closure',
 'no of teams',
 'original state',
 'no of nations',
 'countries',
 'spokesperson',
 'hq state',
 'began operations',
 'coord',
 'teams',
 'sports fielded',
 'place of birth',
 'official language',
 'powerout',
 'release',
 'propellant capacity',
 'home era',
 'creator',
 'official site',
 'available',
 'place of origin',
 'final result',
 'locale',
 'prevfounded',
 'other title',
 'gridreference',
 'end note',
 'mcurrent',
 'lifespan',
 'operated',
 'close',
 'sport',
 'available f',
 'year founded',
 'capacity',
 'languageorigin',
 'lat long',
 'away',
 '!!!!!firstlabel!!!!!',
 'state',
 'coach wins',
 'closed',
 'significant design',
 'author',
 'first aired',
 'discovery',
 '- created',
 'publishers',
 'parent',
 'publishing city',
 'last flight',
 'solar site area',
 '!!!!!region_label!!!!!',
 'tea origin',
 'year of invention',
 'ranked',
 'other designations',
 'design effort',
 'formed month day',
 'group',
 'original owner',
 'cities',
 '!!!!!chairman_label!!!!!',
 'engine date',
 'bronze',
 'competitors men',
 'last',
 'home ground',
 'residence',
 'region',
 'ballpark',
 'formedyear',
 '!!!!!region_type!!!!!',
 'nonfiction topics',
 'home arena',
 'synchrofounded',
 'lyrics date',
 'designations',
 'founded by',
 'coined by',
 'north website',
 'other names',
 'formed year',
 '!!!!!party_label!!!!!',
 'telescope area',
 'date premiered',
 'winning time',
 'appeared',
 'first publication',
 'discontinued',
 'first tournament',
 'ancestor names',
 'venues',
 'leader title',
 'transpublisher',
 'retail availability',
 'cultivar group',
 'arena location',
 'alt name',
 'nationality',
 'certifying agency',
 'Flag of !!!!!common_name!!!!!',
 'parent body',
 'introduction os',
 'configuration',
 '!!!!!chair_label!!!!!',
 'resort',
 'date of birth',
 'built in',
 'instituted',
 'grid reference',
 'current operator',
 'stadium',
 'destination countries',
 'time spent in space',
 '!!!!!ownertitle!!!!!',
 'engine',
 'arena',
 '03 bronze',
 '- coordinates',
 'total site area',
 'park section',
 'final release',
 'formed',
 'dates',
 'founders',
 'west website',
 'date of equipping',
 'also known as',
 'location founded',
 'rescuename',
 'branchfrom',
 'Installed capacity (!!!!!capacityyear!!!!!)',
 'wcurrent',
 'active as of',
 'introduced in',
 "men's coach",
 'appearance',
 'last launch',
 'nomen hiero',
 'captained by',
 'num teams',
 'online services',
 'hq location',
 'country',
 'east os grid reference',
 'areatotal',
 'robust struct built',
 'power output',
 'period in office',
 'west operator',
 'first release date',
 'nations',
 'site',
 'surface',
 'nebty hiero',
 'debuthead',
 'finalists',
 'launch date',
 'plant operator',
 'female weight',
 'publication date',
 'agency',
 'coa',
 'memorial',
 'linename',
 'power consumption',
 'orbitalperiod',
 'original run',
 'prov',
 'track owner',
 'writer',
 'coach year',
 'west os grid reference',
 'party',
 '!!!!!owntitle!!!!!',
 'events',
 'inventors',
 'finish',
 'fire captain',
 'abolition',
 'premiere',
 'appearances',
 'gbgridref',
 'last meeting result',
 '!!!!!captaintitle!!!!!',
 'wpheadcoach',
 'bronzew',
 'divename',
 'producers',
 'pen name',
 'ownership',
 'o end',
 'nickname',
 'official names',
 'cityvillage',
 'available for military service',
 'country ts',
 'swimheadcoach',
 'mostrecent',
 'coordinates note',
 'introduction date',
 'lat d',
 'first launch',
 'launched',
 'organization',
 'creators',
 'location ts',
 'commodore',
 'based in',
 'geographic coordinates',
 'introdate',
 'mastersname',
 'major cities',
 '!!!!!leadershiptitle!!!!!',
 'end point',
 'related scientific disciplines',
 'coach start',
 'premiere date',
 'home stadium',
 'to',
 'original work',
 'web',
 'webapp',
 'retirement',
 'south os grid reference',
 'first year',
 'common nicknames',
 'seating capacity',
 'year of completion',
 'othernames',
 'source',
 'construction began',
 'yearsactive',
 'parents',
 'location',
 'origlanguage',
 'full title',
 'usage',
 'burned area',
 'osgraw',
 '!!!!!teams_label!!!!!',
 'associations',
 'Medals Ranked !!!!!rank!!!!!th',
 'time zone',
 '!!!!!prev!!!!! (!!!!!prev_no!!!!!)',
 'prime mover',
 'broadcast from',
 'period',
 'commonname',
 'years active',
 'coached by',
 'last date',
 'birth',
 'original source',
 'birth date',
 'current owner',
 'final edition',
 'trans capacity',
 'wind site area',
 'birth place',
 'first season',
 'release date',
 'homescore',
 'version release date',
 'data sources',
 'coords',
 'headcoach',
 '!!!!!home!!!!!',
 'principal venue',
 'prog lang',
 'birth name',
 '!!!!!chrtitle!!!!!',
 'burn time',
 'date established',
 'political party',
 'earliest publications',
 'original author',
 'people',
 'yearconstruction',
 'footage',
 'journey time',
 'spill date',
 'last event',
 'parent unit',
 'founded at',
 'effective region',
 'broadcast area',
 'launch location',
 'duration',
 'home',
 'orbit period',
 'nations participating',
 'born',
 'tea names',
 'first recorded',
 'national side',
 'altoffsp',
 '!!!!!head_name!!!!!',
 'debutyr',
 'ancestor arts',
 'discontinuation place',
 'mass number',
 'monarchy abolished',
 'discontinuation',
 'mastersfounded',
 'geographic origin',
 'rescuefounded',
 'leader',
 'first award',
 'programming language',
 'destruction date',
 'flight origin',
 'bauzeit',
 'plant decommission',
 'power',
 'coach end',
 'name, symbol',
 'transponder capacity',
 'chiefexec',
 'ground capacity',
 'leadership',
 'Coat of arms of !!!!!name!!!!! Coat of arms',
 'arena coord',
 'lyrics',
 'parent range',
 'diveheadcoach',
 'managing agency',
 'confederations',
 'flourished',
 'origins',
 'east website',
 'creators series',
 'internationalspan',
 'ended service',
 'discontinuation date',
 'exposure latitude',
 'primemover',
 'bronzem',
 'end year',
 'year start',
 'womens',
 'first flight aircraft',
 'coordinates ref',
 'memorials',
 'winner-origin',
 'end date',
 'discovery date',
 'collecting area',
 '!!!!!coordinate_title!!!!!',
 'first primeval',
 'stop',
 'termini',
 'artists',
 'earlier spellings',
 '!!!!!head_label!!!!!',
 'in service',
 'alternative names',
 'synchroheadcoach',
 'important events',
 'alt title',
 'area',
 'museum',
 'apps',
 'stateterritory',
 'tour captain',
 'captains',
 'start',
 'current affiliation date',
 'planet landing',
 'lage',
 'area of origin',
 'last primeval',
 'engine power',
 'geographical range',
 'lang',
 'head',
 'only',
 'continents',
 'inauguration date',
 'start point',
 'first release version',
 'east operator',
 'founding year',
 'construction',
 'alternate name',
 'engine decommissioned',
 'mastersheadcoach',
 'os grid reference',
 'first appearance',
 '!!!!!gold!!!!!',
 'geographic distribution',
 'initial release',
 'language count',
 'final winner',
 'morenicks',
 'official',
 'clubs',
 'city of license',
 'arearank',
 'rescueheadcoach',
 'founded date',
 'current-owner',
 'first released',
 'branched from',
 'sc operator',
 'year end',
 'telescope name',
 '!!!!!countrytag!!!!!',
 'administrative centre',
 'todate',
 'gridref',
 'source of milk',
 'major events',
 'imam',
 'location region',
 'alter ego',
 'developed by',
 'written in',
 'timezone',
 'year closed',
 'constructed',
 'firstmo',
 'parent institution',
 'covered area',
 'current owners',
 'dates of operation',
 'competitors',
 'event',
 'website',
 'nick',
 'weak struct built',
 'teammanager',
 'north os grid reference',
 'nat',
 '!!!!!years!!!!!',
 'construct start',
 '!!!!!leader_title!!!!!',
 'ship nickname',
 '- Previous season !!!!!prevseason!!!!!',
 'whytetype',
 'pseudonym',
 'elements',
 'final date',
 'yesorretiredyear',
 'early forms',
 'symbol',
 '!!!!!lastevent!!!!!',
 'main producers',
 '- origin',
 'altnames',
 'osgridref',
 'year built',
 'est',
 'formation date',
 'invented year',
 'wordname',
 'organelle-year',
 'munwebpage',
 'present location',
 'coaching career',
 'opm protein',
 'chairperson',
 'studio',
 'point of origin',
 'folk tale name',
 'first flight date',
 '- !!!!!prevseason!!!!!',
 'countryflag',
 ...]

In [111]:
nx.draw(max_graph, with_labels=True)


Enough said.